2022-05-10

Article introduction

Title: “Peripheral Blood Mitochondrial DNA Copy Number Is Associated with Prostate Cancer Risk and Tumor Burden”

Authors: Weimin Zhou, Min Zhu, Ming Gui, Lihua Huang, Zhi Long, Li Wang, Hui Chen, Yinghao Yin, Xianzhen Jiang, Yingbo Dai, Yuxin Tang, Leye He, Kuangbiao Zhong

Goal: Determine whether mtDNA copy number is a predictor of prostate cancer

Flowchart for project flow

Data set overview

Loading

  • Dimensions of the raw data set: 392, 13

  • Stratified on Controls and prostate cancer cases (attribute called Group)

  • Purpose of article: Predict prostate cancer from other variables, mainly mtDNA

Cleaning

  • Check for duplicates

  • Filter for PCRsuccess

  • New dimensions: 387, 13
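
The cleaning steps above could be sketched roughly as follows, in Python for illustration only. The column names `Group` and `PCRsuccess` come from the slides; the `ID` and `mtDNA` fields, the toy rows, and the coding of a successful PCR as 1 are assumptions.

```python
# Toy records standing in for the raw data set (all values are made up).
raw = [
    {"ID": 1, "Group": 0, "mtDNA": 1.10, "PCRsuccess": 1},
    {"ID": 2, "Group": 1, "mtDNA": 0.80, "PCRsuccess": 1},
    {"ID": 2, "Group": 1, "mtDNA": 0.80, "PCRsuccess": 1},  # exact duplicate
    {"ID": 3, "Group": 1, "mtDNA": None, "PCRsuccess": 0},  # failed PCR
]

def drop_duplicates(rows):
    """Keep the first occurrence of each fully identical row."""
    seen = set()
    unique = []
    for row in rows:
        # Sort by column name only, so None values are never compared.
        key = tuple(sorted(row.items(), key=lambda kv: kv[0]))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def filter_pcr_success(rows):
    """Keep rows whose PCR succeeded (assumed to be coded as 1)."""
    return [row for row in rows if row["PCRsuccess"] == 1]

clean = filter_pcr_success(drop_duplicates(raw))
print(len(raw), "->", len(clean), "rows")
```

On the real data the same two steps would account for the change from 392 to 387 rows, with the column count unchanged.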

Augmenting

  • BMI- and DFI-classifier

  • New columns based on TNM-notation

  • Add “Group” as strings

  • New dimensions: 387, 18
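
A minimal sketch of the augmentation steps, under stated assumptions: the BMI cut-offs are the standard WHO categories (the project may have used different ones), the TNM strings are assumed to look like `T2aN0M0`, and `Group` is assumed to be coded 0 for controls and 1 for cases. The column names `BMI`, `TNM`, `BMI_class` and `Group_label` are hypothetical. The DFI classifier is left out because the slides do not define its classes.

```python
import re

GROUP_LABELS = {0: "Control", 1: "Case"}  # assumed numeric coding of Group
TNM_PATTERN = re.compile(r"T(\d[a-c]?)N(\d)M(\d[a-c]?)")  # assumed format

def bmi_class(bmi):
    """Map a BMI value to a WHO-style category."""
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25:
        return "Normal"
    if bmi < 30:
        return "Overweight"
    return "Obese"

def split_tnm(tnm):
    """Split a string such as 'T2aN0M0' into separate T, N and M values."""
    match = TNM_PATTERN.fullmatch(tnm or "")
    if match is None:
        return {"T": None, "N": None, "M": None}
    t, n, m = match.groups()
    return {"T": t, "N": n, "M": m}

def augment(row):
    """Return a copy of `row` with the derived columns added."""
    new = dict(row)
    new["BMI_class"] = bmi_class(row["BMI"])
    new.update(split_tnm(row.get("TNM")))
    new["Group_label"] = GROUP_LABELS[row["Group"]]
    return new

patient = {"Group": 1, "BMI": 27.3, "TNM": "T2aN0M0"}  # made-up record
print(augment(patient))
```

Returning a copy rather than modifying the row in place keeps the cleaned data untouched, which mirrors the separate "clean" and "augment" stages on the slides.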

Boxplot with continuous variables, any outliers?
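
As a numeric companion to the boxplots, the usual whisker rule (points more than 1.5 × IQR beyond the quartiles) can be sketched as below. The measurements are made up, and the quartile method is an assumption; plotting libraries differ slightly in how they compute quartiles.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

mtdna = [1.0, 1.1, 0.9, 1.2, 1.0, 5.0]  # made-up measurements
print(iqr_outliers(mtdna))
```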

Boxplot with discrete variables, any outliers?

Re-creating plot from the article

Article visualization

A better biomarker for prostate cancer?

Logistic regression, excl. PSA

Significant p-values:
Maybe the distribution of DFI classes is skewed?
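
To illustrate where a p-value in such a model comes from, here is a minimal single-predictor logistic regression fitted by Newton-Raphson, with a Wald test for the slope. This is a sketch, not the project's actual model: the data are made up, the direction of the effect in the toy data is arbitrary, and the real analysis used several predictors.

```python
import math

def fit_logistic(x, y, n_iter=25):
    """Fit y ~ intercept + slope * x by Newton-Raphson.

    Returns (intercept, slope, standard error of slope, Wald p-value).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
        # Fisher information matrix [[a, b], [b, c]].
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        c = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * c - b * b
        # Newton step: beta += inverse(information) * gradient.
        b0 += (c * g0 - b * g1) / det
        b1 += (a * g1 - b * g0) / det
    # Slope variance is the [1][1] entry of the inverse information,
    # taken from the last iteration (the estimates have converged by then).
    se_b1 = math.sqrt(a / det)
    z = b1 / se_b1
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return b0, b1, se_b1, p_value

# Made-up predictor values (e.g. an mtDNA measure) and case status.
x = [0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5]
y = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
intercept, slope, se, p_value = fit_logistic(x, y)
print(f"slope={slope:.2f}, se={se:.2f}, p={p_value:.3f}")
```

With a heavily skewed categorical predictor, some classes contain very few observations, which inflates the standard errors and can make the corresponding p-values unreliable.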

Logistic regression, incl. PSA

Significant p-values:

Principal component analysis (PCA)

PCA
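
For readers unfamiliar with PCA, the core computation (standardize the variables, form their covariance matrix, take the leading eigenvector) can be sketched as follows. The three variables and their values are made up and strongly correlated by construction, and power iteration is used here only to keep the example dependency-free; it is not necessarily how the project computed its components.

```python
import math
import statistics

def standardize(columns):
    """Scale each column to mean 0 and sample standard deviation 1."""
    out = []
    for col in columns:
        mu, sd = statistics.mean(col), statistics.stdev(col)
        out.append([(v - mu) / sd for v in col])
    return out

def covariance_matrix(columns):
    """Sample covariance matrix of already centered columns."""
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in columns]
            for ci in columns]

def first_component(cov, n_iter=200):
    """Power iteration: leading eigenvector and eigenvalue of `cov`."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cv = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(vi * ci for vi, ci in zip(v, cv))
    return v, eigenvalue

# Made-up values for three continuous variables (one list per variable).
mtdna = [0.8, 1.0, 1.2, 0.9, 1.4, 1.1]
psa = [4.0, 6.5, 12.0, 5.0, 20.0, 9.0]
age = [60, 65, 70, 62, 75, 68]

cov = covariance_matrix(standardize([mtdna, psa, age]))
pc1, eigenvalue = first_component(cov)
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
print("PC1 loadings:", [round(x, 3) for x in pc1])
print("Share of variance explained by PC1:", round(explained, 3))
```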

Interesting finding during exploratory data analysis

Some more data exploration

Conclusion

  • We can support the conclusion of the article: mtDNA is a biomarker for prostate cancer (i.e., the result is reproducible)
  • PSA levels seem to be an even better biomarker
  • Both of the above could be supported by logistic regression
  • Conclusion for PCA?
  • Further research? Classification on Gleason scores/AJCC should be possible, as should regression (albeit outside the scope of this course)